INTRODUCTION

The opioid crisis in Mesa, Arizona is a public health emergency that must be addressed. A spike in overdoses onset by the COVID-19 pandemic in early 2020 saw an increase of 76% in the number of opioid overdose incidents and a 132% increase in opioid related deaths1. The problem is not a new one, however. Due to an alarming increase in opioid deaths in 2016, Governor Ducey declared a state of emergency on June 5, 2017 and the Mesa County Opioid Response Group was established in June 2018 to reduce the community impact of opioids through prevention, treatment, and recovery services. The goal of the group is to prevent substance misuse through:

  • Education of community members & health providers
  • Improving treatment access & retention
  • Reduce harm by providing educational information and training on Naloxone and how to obtain and use the life-saving overdose reversal drug.

The state of emergency was officially terminated in May, 2018, despite the growing numbers. Figure 1, sourced from the data.mesaaz.gov, illustrates the urgency of the increasing overdose incident count, reaching a record high of 211 in September of this year.

View the data figure 1. interactive chart via data.mesaaz.gov


We believe in the power of predictive modeling to assist in the critical decision making process around the allocation of resources towards the opioid crisis. The goal of this analysis is to estimate a geospatial risk prediction model, predicting overdoses as a function of environmental factors like crime, proximity to city facilities & services, and demographic indicators. The process includes wrangling data from Mesa’s Open Data portal, and engineering them into powerful predictive features. We use a Poisson Regression, cross validated with K-fold and Leave One Group Out (LOGO) cross validation methods. We are essentially borrowing the experience of observed opioid overdose incidents of 2017, and will compare our predictions to observed opioid overdose incidents from 2018. Predictions from this models are interpreted as ‘the forecasted risk/opportunity of that event occurring here’. A series of ‘goodness of fit’ indicators will tell us how well our model is performing, and if our predictions will generalize well across time and space.

Opioid Observation Treatment Center

Our end goal with the geospatial risk model is to produce a tool that can be used by Mesa health officials to implement harm reduction strategies through proactive resource allocation. We propose an updated response program that will create facilities throughout Mesa called Opioid Observation and Treatment Centers (OOTC). An OOTC will serve two purposes:

  • Be a safe space that provides support, safe - monitoring/safe injection site, overdose treatment, and primary care for individuals who are using and would otherwise be on the street.

  • Have staff and resources prepared for emergency response to overdose incidents – cater to public and residential overdoses.

Web Application

Through this analysis, we hope to identify locations throughout Mesa that will benefit most from the creation of Opioid Observation and Treatment Centers, and provide recommendations for future intervention. We have conceptualized a web application that will give access to the automatically updating output of our machine learning model. City and healthcare officials can utilize our analysis output to properly allocate staff and resources towards creating these centers with the goal of mitigating opioid overdose death.

WebApp Wireframe


EXPLORATORY ANALYSIS

Wrangling Data

Our data sets include:
- Fire and Medical Opioid Overdose Incidents Confirmed cases of opioid overdose, locations are not actual but instead rounded to approximately 1/3 mile increments. Opioid overdose confirmed by 1) patient or witness verification, 2) Opioid found on scene or 3) positive response to Narcan treatment.
- Census ACS data demographic and socio-economic indicators at the block group level.
- City Property Data will be filtered to account for various facility type - child crisis center, fire and police station, public housing, arts and education centers.
- Crime Data Incidents based on initial police reports taken by officers when responding to calls for service. Data is modified for public use. Address and Location are not exact locations of incidents and have been rounded to nearest hundred block. Lat/Long are approximations only based on rounded hundred block. Incidents reported in this dataset may not correlate with 911 Events datasets and calls for Police service. The City of Mesa does not disclose information that is inflammatory in nature that impacts our citizens. Split by type; misdemeanors and felonies.
Park Locations And Amenities Information about location and amenities available at developed city parks, park open spaces (basins) maintained by the City of Mesa. Pools, Recreation Centers and other public gathering facilities owned and/or maintained by the City of Mesa’s Parks Department such as convention center, amphitheater and cemetery are also listed.
- Light Rail Stations This dataset provides information regarding the location of current light rail stations within the City of Mesa and each station’s name and address.
- City Boundary Map Perimeter of the city boundary.
Zoning Districs Specifically delineated geographic areas in the city within which regulations and requirements uniformly govern the use of land - separated between High density residential, low density residential, downtown, commercial, and industrial.




figure 2. illustrates the relationship of zoning districts to the opioid overdose incident points. Because the overdose incident points are rounded to the approximate 1/3 mi. interval for privacy reasons, we have aggregated the amount of overdoses at each point and visualized each point as a graduated symbol representing the count number. This is how overdose incident points will be illustrated throughout the analysis, with the exception of othe fishnet.

The Fishnet

A municipal or government boundary, be it a Census block group, or a police district, is often drawn arbitrarily or does not represent a consistent division of space. We are hoping to analyze the risk of opioid overdose evenly across space and cannot be limited by these boundaries. For this reason, we create a ‘fishnet’, or grid of cells at 1/3 mi dimensions, to which we will aggregate our point data. The risk predictions will be represented as a smooth surface. Figure 3 illustrates opioid overdose incidents from 2017, where darker pixels indicate a higher count of incidents.

figures 4.1 - 4.4 follow the same process, but with our independent variables wrangled from our data sets.







Nearest Neighbor

Figures 4.5 and 4.6 are a representation of the nearest neighbor function used on our City property and census data. The nearest neighbor function calculates the average distance of each fishnet cell to the closest 3 points of the variable. this creates a gradiated map because the value of each fishnet cell takes on the calculated average distance. We will determine whether we will use the unaggregated or nearest neighbor version of these variables for the final poisson regression through analyzing the correlation of each independent variable to the dependent variable later in the analysis. Our next step is to explore the spatial process of opioid overdoses in Mesa.



Local Moran’s I

To test the spatial process, we us a statistic called Local Moran’s I. Here, the null hypothesis is that the overdose count at a given location is randomly distributed relative to its immediate neighbors. We create a neighbor list and a spatial weights index using queen contiguity. Our output, figure 5.1, depicts where there is relatively high local clustering of overdose incidents. We see this clustering also represented in the spatial process of the p-value, where lighter pixel colors represent a lower p-value. If our null hypothesis states that there is no significant local clustering, a p-value <= 0.05 indicates the areas in which we can reject the null hypothesis in favor of the alternative hypothesis, that there is in fact local clustering. The significant hot spots are created from the fishnet pixels that have a p-value <= 0.05.

Figure 5.2 illustrates the distance function used on our significant hot spot output, thus creating a value for each fishnet cell that represents the distance between the centroid of the cell to the nearest significant cluster. This value will be used as a predictive feature in our poisson regression model.


Variable Correlation

For the final step of our exploratory analysis, we plot the Pearson correlation of each independent variable and our dependent variable. If the Pearson coefficient (the r value displayed in the top left corner of each individual scatter plot) is close to 1, we have a strong positive correlation. Similarly, if the r value is close to -1, we have a strong negative correlation. The closer the r value is to 0, the weaker the correlation between our variables. While we can see there is no strong positive or negative correlation between any predictor and the dependent variable, this step helps us to determine which variables should be used in our regression model. For example, we see Low.Income has a positive correlation of 0.33, while the nearest neighbor version of this variable, low_inc.nn has an almost equal negative correlation of -0.24. We choose Low.Income to use in our model because change in demographic indicators such as income levels or race is not consistent with distance. Proximity does not account for “the other side of the tracks” phenomenon where social and economic status is contained within a geographic boundary. And we know low income neighborhoods often face more drug related issues, so we expect a positive correlation.




POISSON REGRESSION

The histogram depicting the distribution of overdose occurrences in 2017 in figure 7 tell us how heavily skewed towards 0 our data is. This is understandable – as the vast majority of our fishnet cells will have no overdose occurrences within them. The Poisson distribution most similarly resembles our overdose distribution.



The variables we choose to use in our regression are as follows:

Commercial, High.Density.Residential, Low.Density.Residential, Downtown, Park.nn, Arts.and.Education.nn, Police.Fire.nn, Public.Housing, Vacant.Property, Low.Income, Low.Rent, Majority.White, Majority.Hispanic,Child.Crisis.Center, Industrial, misdemeanor, felony,lightrail, overdose.isSig.dist, overdose.isSig


Cross Validation

While an accurate model is very important, we are more concerned with generalizability for the geospatial risk model. This will tell us how well our model performs on seeing new data and its ability to recognize spatial group contexts, like demographic and racial differences across neighborhoods. Our methods for cross-validating our data include k-fold cross validation(CV) and leave one group out cross validation (LOGO-CV). K-fold takes a group of our fishnet cells, trains our model on the remaining cells, tests the prediction on said group, records a goodness of fit metric, and moves onto the next group of cells until every cell has been trained and tested on. LOGO-CV has a similar process, except we are using the Census Tract GEOID as the geographic area we exclude and test on. LOGO-CV should ideally use neighborhood geographic boundaries because Census Tracts are often arbitrary, but we were limited by the data available to us. We are also looking to see how each cross validation method performs with and without spatial context by including our Moran’s I significance indicator. Therefore, we are running four iterations of our cross validation:

1. K-fold CV - Just risk factors
2. K-fold CV - Spatial Process
3. LOGO-CV - Just risk factors
4. LOGO-CV - Spatial Process

The hope here is to confirm that the inclusion of the spatial process is improving our model. Figures 8 and 9 visualize the mean absolute errors (MAE) made by our model. This histograms in figure 8 are showing us the distribution of the MAE for each fold made by each regression iteration. And as we can see in the MAE table for each regression below, our average error, while marginal, decreases for our LOGO-CV when the spatial process is accounted for, confirming our expectation.




Regression Mean_MAE SD_MAE
Random k-fold CV: Just Risk Factors 0.08 0.07
Random k-fold CV: Spatial Process 0.09 0.07
Spatial LOGO-CV: Just Risk Factors 0.22 0.29
Spatial LOGO-CV: Spatial Process 0.20 0.27


Regression Morans_I p_value
Spatial LOGO-CV: Just Risk Factors 0.1043112 0.023
Spatial LOGO-CV: Spatial Process 0.1105592 0.023



Regression Majority_Hispanic Majority_Non_Hispanic
Random k-fold CV: Just Risk Factors 0.0029079 0.0028854
Random k-fold CV: Spatial Process -0.0135207 0.0088243
Spatial LOGO-CV: Just Risk Factors -0.0031414 0.0005293
Spatial LOGO-CV: Spatial Process -0.0257327 0.0023287




CONCLUSION

Finally, with our model’s predictions, we are able to spatially join the commercial and downtown zoned areas to the fishnet cells with the highest prediction count (figure 16.1). The result, depicted in figure 17 and figure 18, shows us which blocks in the city of Mesa will benefit most from the intervention of an OOTC. The darker color indicates which commercial zoning districts intersect with where our model predicted the highest amount of overdose incidents. The blue in surrounding areas are residential zoning districts with high prediction counts that should be a secondary option to the commercial and downtown districts.


figure 18.1 figure 18.2

Our model is performing well with a relatively low median average error, but of course there is always opportunity to improve our analysis. Our predictions are limited in accuracy due to the fact that we do not have access to exact location of observed opioid overdose incidents. With permission from the city to move forward on this project, we would respectively request unaggregated point data for the opioid overdose incidents to train our model. We would also request point data for existing health care facilities and schools, which would provide our model with valuable predictive features, and we would expect this data to improve our results.

Code Appendix

#SETUP
knitr::opts_chunk$set(warning=FALSE)

library(tidyverse)
library(sf)
library(RSocrata)
library(ggmap)
library(viridis)
library(spatstat)
library(raster)
library(spdep)
library(FNN)
library(grid)
library(gridExtra)
library(knitr)
library(kableExtra)
library(tidycensus)
library(RColorBrewer)
library(ggthemes)
library(sp)
library(rgeos)
library(maptools)
library(dplyr)
library(units)
library(geojsonio)
library(cowplot)
library(ggcorrplot)

options(scipen=999)
options(tigris_class = "sf")

census_api_key("d5e25f48aa48bf3f0766baab06d59402ea032067", overwrite = TRUE) 

#LOAD FUNCTIONS

source("https://raw.githubusercontent.com/urbanSpatial/Public-Policy-Analytics-Landing/master/functions.r")

#change cross validate function
crossValidate<- function(dataset, id, dependentVariable, indVariables) {
  
  allPredictions <- data.frame()
  cvID_list <- unique(dataset[[id]])
  
  for (i in cvID_list) {
    
    thisFold <- i
    cat("This hold out fold is", thisFold, "\n")
    
    fold.train <- filter(dataset, dataset[[id]] != thisFold) %>% as.data.frame() %>% 
      dplyr::select(id, geometry, indVariables, dependentVariable)
    fold.test  <- filter(dataset, dataset[[id]] == thisFold) %>% as.data.frame() %>% 
      dplyr::select(id, geometry, indVariables, dependentVariable)
    
    regression <-
      glm(countoverdose ~ ., family = "poisson", 
          data = fold.train %>% 
            dplyr::select(-geometry, -id))
    
    thisPrediction <- 
      mutate(fold.test, Prediction = predict(regression, fold.test, type = "response"))
    
    allPredictions <-
      rbind(allPredictions, thisPrediction)
    
  }
  return(st_sf(allPredictions))
}


mapTheme<- function(base_size = 12, title_size = 16) {
  theme(
    text = element_text( color = "black"),
    plot.title = element_text(size = title_size,colour = "black"),
    plot.subtitle=element_text(face="italic"),
    plot.caption=element_text(hjust=0),
    axis.ticks = element_blank(),
    panel.background = element_blank(),axis.title = element_blank(),
    axis.text = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    strip.text.x = element_text(size = 14),
    legend.key = element_rect(fill=NA))
}

g_legend<-function(a.gplot){
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  return(legend)
  }


#----CENSUS TRACTS, BLOCKGROUPS AND CITY BOUNDARY----

#CENSUS TRACTS 
tracts17 <- get_acs(geography = "tract", variables = c("B25026_001","B19013_001","B25058_001",
                                                       "B06012_002", "B25003_003", "B25003_002", "B25004_001" ), 
                    year=2017, state=04, county=013, geometry=T) %>% 
  st_transform('EPSG:2224')%>%
  filter(GEOID != '04013010102')%>%
  filter(GEOID != '04013941300')%>%
  filter(GEOID != '04013422308')%>%
  filter(GEOID != '04013422307')%>%
  filter(GEOID != '04013422508')%>%
  filter(GEOID != '04013422506')%>%
  filter(GEOID != '04013422622')%>%
  filter(GEOID != '04013318400')%>%
  filter(GEOID != '04013420100')%>%
  filter(GEOID != '04013810300')%>%
  filter(GEOID != '04013523104')%>%
  filter(GEOID != '04013420110')%>%
  filter(GEOID != '04013816900')%>%
  filter(GEOID != '04013815902')%>%
  filter(GEOID != '04013810800')%>%
  filter(GEOID != '04013815600')%>%
  filter(GEOID != '04013815200')%>%
  filter(GEOID != '04013319904')%>%
  filter(GEOID != '04013319402')%>%
  filter(GEOID != '04013319404')%>%
  filter(GEOID !="04013319906")

#CITY BOUNDARY
city_boundary <- st_as_sf(st_read("https://raw.githubusercontent.com/OliviaScalora/MUSA508_Final/main/Data/City_Boundary.csv"), 
                          wkt = 'Geometry', crs = 4326, agr = 'constant')%>%
  st_transform(st_crs(tracts17))

#filter tracts within city boundary
mesa_tracts17 <- tracts17[city_boundary,]%>%
  dplyr::select( -NAME, -moe) %>%
  spread(variable, estimate) %>%
  dplyr::select(-geometry) %>%
  rename(TotalPop = B25026_001, 
         MedHHInc = B19013_001, 
         MedRent = B25058_001,
         TotalPoverty = B06012_002,
         TotalRent = B25003_003,
         TotalOwn = B25003_002,
         VacantUnits = B25004_001)

#BLOCK GROUPS
blockgroups17 <- get_acs(geography = "block group", variables = c("B01003_001","B19013_001","B25058_001",
                                                                  "B25003_003", "B25003_002", "B25004_001", "B25001_001",
                                                                  "B02001_002", "B03002_012"), 
                         year=2017, state=04, county=013, geometry=T) %>% st_transform('EPSG:2224')

mesa_bg17 <- blockgroups17[city_boundary,]%>%
  dplyr::select( -NAME, -moe) %>%
  spread(variable, estimate) %>%
  dplyr::select(-geometry) %>%
  rename(TotalPop = B01003_001, 
         MedHHInc = B19013_001, 
         MedRent = B25058_001,
         TotalRent = B25003_003,
         TotalOwn = B25003_002,
         VacantUnits = B25004_001,
         TotalUnits = B25001_001,
         TotalWhite = B02001_002,
         HispTotal = B03002_012)%>% 
  mutate(area = st_area(geometry),
         Pct.Vacant = VacantUnits/TotalUnits,
         Pct.White = TotalWhite/TotalPop,
         Pct.Hisp = HispTotal/TotalPop)%>%
  filter(GEOID != '040130101021')%>%
  filter(GEOID != '1040133184002')%>%
  filter(GEOID != '040139413002')%>%
  filter(GEOID != '040138169001')%>%
  filter(GEOID != '040139413004')%>%
  filter(GEOID != '040139413003')%>%
  filter(GEOID != '040133184002')

#----CENSUS VARIABLES----

#block groups with household income in the bottom quintile= low_income 
mesa_bg17_HHInc <-mesa_bg17 %>% drop_na(MedHHInc)%>%mutate(HHInc_q5 = q5(MedHHInc))
low_inc_BG <- mesa_bg17_HHInc%>%filter(HHInc_q5 == 1)%>%dplyr::select(GEOID,geometry,)%>%mutate(Legend = 'Low Income')

#block groups where median rent is in lower 2 quintile
mesa_bg17_rent <-mesa_bg17 %>% drop_na(MedRent)%>%mutate(MedRent_q5 = as.numeric(q5(MedRent)))
low_med_rent_BG <- mesa_bg17_rent%>%filter(MedRent_q5 <=2)%>%dplyr::select(GEOID,geometry,)%>%mutate(Legend = 'Low Rent')

#block groups where vacancy rate is higher than 25%
high_vacancy_bg <- mesa_bg17%>%filter(Pct.Vacant >= .25)%>%dplyr::select(GEOID,geometry,)%>%mutate(Legend = 'High Vacancy')

#block groups where majority hispanic
hisp_bg <- mesa_bg17%>%filter(Pct.Hisp >= .5)%>%dplyr::select(GEOID,geometry,)%>%mutate(Legend = 'Majority Hispanic')

#block groups where vast majority white
white_bg <- mesa_bg17%>%filter(Pct.White >= .85)%>%dplyr::select(GEOID,geometry,)%>%mutate(Legend = 'Majority White')


#----OPIOID DATA----
opioid_data <- st_as_sf(na.omit(read.socrata("https://data.mesaaz.gov/resource/qufy-tzv6.json")), 
                        coords = c("longitude", "latitude"), 
                        crs = 4326, agr = "constant")%>%
               st_transform(st_crs(tracts17))%>%filter(location.longitude != -112.225)
opioid_data <- opioid_data%>%mutate(point = paste(opioid_data$location.latitude, opioid_data$location.longitude))

#count number of overdose occurances at each 1/3 mile interval point and join count column to opioid data set
count <- count(opioid_data,point)
opioid_data <- (st_join(opioid_data, count, all.x = all)%>%dplyr::select(-point.x,-point.y)%>%
  rename(count = n))

#filter points that are within city boundary
opioid_data <- opioid_data[city_boundary,]

opioid_data17 <- opioid_data%>%filter(year=='2017')


#----TRANSPORTATION----

light_rail <- st_read("data/Lightrail/LightRailStation.shp")%>%
  st_transform(st_crs(tracts17))%>%mutate(ID = 1:n(), Legend = "lightrail")%>%
  dplyr::select(ID, Legend,geometry)%>%st_centroid()
light_rail <- light_rail [city_boundary,]


 #----CRIME DATA----


crime <- read.socrata("https://data.mesaaz.gov/resource/39rt-2rfj.csv")
crime <- st_as_sf(na.omit(crime), 
                        coords = c("longitude", "latitude"), 
                        crs = 4326, agr = "constant")%>%
          st_transform(st_crs(tracts17))%>%filter(report_year == '2017')
crime  <- crime[city_boundary,]

#MISDEMEANOR CRIMES
misdemeanor<- crime%>%
  filter(national_incident_based_crime_reporting_description %in% 
           c("DISORDERLY CONDUCT","SIMPLE ASSAULT","FALSE PRETENSES/SWINDLE/CONFIDENCE GAME","TRESPASS OF REAL PROPERTY",
              "CURFEW/LOITERING/VAGRANCY VIOLATION","POCKET-PICKING","DISORDERLY CONDUCT - DV","PURSE-SNATCHING",
              "FALSE PRETENSES/SWINDLE/CONFIDENCE","PEEPING TOM","SUSPICION","NOT REPOTABLE","NOT REPORTABLE","SHOPLIFTING",
              "INTIMIDATION","UNDER FALSE PRETENSES", "OTHER OFFENSES", "OTHER OFFENSE","ALL OTHER OFFENSES", "STOLEN PROPERTY OFFENSE", 
              "DESTRUCTION/DAMAGE/VANDALISM OF PROPERTY", "RUNAWAY", "COUNTERFEITING/FORGERY", "FAMILY OFFENSES, NONVIOLENT"))%>%
        mutate(Legend = 'misdemeanor')

#NON DRUG RELATED FELONY
felony<- crime%>%
  filter(!national_incident_based_crime_reporting_description %in% 
           c("DISORDERLY CONDUCT","SIMPLE ASSAULT","FALSE PRETENSES/SWINDLE/CONFIDENCE GAME","TRESPASS OF REAL PROPERTY",
             "CURFEW/LOITERING/VAGRANCY VIOLATION","POCKET-PICKING","DISORDERLY CONDUCT - DV","PURSE-SNATCHING",
             "FALSE PRETENSES/SWINDLE/CONFIDENCE","PEEPING TOM","SUSPICION","NOT REPOTABLE","NOT REPORTABLE","SHOPLIFTING",
             "INTIMIDATION","UNDER FALSE PRETENSES", "OTHER OFFENSES", "OTHER OFFENSE","ALL OTHER OFFENSES", "STOLEN PROPERTY OFFENSE", 
             "DESTRUCTION/DAMAGE/VANDALISM OF PROPERTY", "RUNAWAY", "COUNTERFEITING/FORGERY", "FAMILY OFFENSES, NONVIOLENT", 
             "DRUG/NARCOTIC VIOLATION","DRUG EQUIPMENT VIOLATION"))%>% mutate(Legend = 'felony')

#----CITY PROPERTIES----

#Parks 
parks<-na.omit(read.csv('data/Parks_Locations_And_Amenities.csv'))
parks<- st_as_sf(parks,coords = c("Longitude", "Latitude"),crs = 4326, agr = "constant")%>%
                         st_transform(st_crs(tracts17))%>%mutate(ID = 1:n(), Legend = "Park")%>%
                        dplyr::select(ID, Legend,geometry)
parks <- parks [city_boundary,]

#police and fire stations
mesa_police_fire<-st_as_sf(na.omit(read.socrata("https://data.mesaaz.gov/resource/xms2-ya86.json")),
                            coords = c("longitude", "latitude"),
                            crs = 4326, agr = "constant")%>%
                 st_transform(st_crs(tracts17))%>%
                 filter(property_use %in% c("Public Safety--Fire/Police", "Park/Public Safety"))%>%
                 mutate(ID = 1:n(),Legend = "Police&Fire Station")%>%
                 dplyr::select(ID,Legend,geometry)


#vacant lots
vacant<- st_as_sf(na.omit(read.socrata("https://data.mesaaz.gov/resource/xms2-ya86.json")),
                            coords = c("longitude", "latitude"),
                            crs = 4326, agr = "constant")%>%
         st_transform(st_crs(tracts17))%>%
         filter(property_use %in% c("Vacant", "Vacant (ADOT remnant)"))%>%
         mutate(ID = 1:n(), Legend = "Vacant Property")%>%
         dplyr::select(ID, Legend,geometry)


#child crisis center
cccenter<- st_as_sf(na.omit(read.socrata("https://data.mesaaz.gov/resource/xms2-ya86.json")),
                  coords = c("longitude", "latitude"),
                  crs = 4326, agr = "constant")%>%
          st_transform(st_crs(tracts17))%>%
          filter(property_use %in% "Child Crisis Center")%>%
          mutate(ID = 1:n(),Legend = "Child Crisis Center")%>%
          dplyr::select(ID, Legend,geometry)

#arts and education centerS
arts_edu<- st_as_sf(na.omit(read.socrata("https://data.mesaaz.gov/resource/xms2-ya86.json")),
                    coords = c("longitude", "latitude"),
                    crs = 4326, agr = "constant")%>%
           st_transform(st_crs(tracts17))%>%
           filter(property_use %in% c("Mesa Arts Center","Museums","Libraries","Sequoia Charter School"))%>%
           mutate(ID = 1:n(), Legend = "Arts and Education")%>%
           dplyr::select(ID, Legend,geometry)


#public housing
public_housing<- st_as_sf(na.omit(read.socrata("https://data.mesaaz.gov/resource/xms2-ya86.json")),
                    coords = c("longitude", "latitude"),
                    crs = 4326, agr = "constant")%>%
                st_transform(st_crs(tracts17))%>%
                filter(property_use %in% c("Housing", "Escobedo Housing", "NSP", "Residential Property"))%>%
                mutate(ID = 1:n(), Legend = "Public Housing")%>%
                dplyr::select(ID, Legend,geometry)


#----ZONING----

#High Density residential
HD_Residential <- st_read("https://raw.githubusercontent.com/OliviaScalora/MUSA508_Final/main/Data/Zoning%20Districts.geojson")%>%
  st_transform(st_crs(tracts17))%>%
  filter(zoning %in% c("RM-3","RM-2","RM-4"))%>%
  dplyr::select(objectid, geometry)%>%mutate(zone = 'HD')

#Low Density Residential- "single residence" zoning
LD_Residential <- st_read("https://raw.githubusercontent.com/OliviaScalora/MUSA508_Final/main/Data/Zoning%20Districts.geojson")%>%
  st_transform(st_crs(tracts17))%>%
  filter(zoning %in% c("RS-15","RS-35","RS-43","RS-6","RS-90","RS-9","PC","RSL-2.5","RSL-4.5","RSL-2.5"))%>%
  dplyr::select(objectid, geometry)%>%mutate(zone = 'LD')

#Commercial
Commercial <- st_read("https://raw.githubusercontent.com/OliviaScalora/MUSA508_Final/main/Data/Zoning%20Districts.geojson")%>%
  st_transform(st_crs(tracts17))%>%
  filter(zoning %in% c("GC","LC","NC","OC","RSL-4.0"))%>%
  dplyr::select(objectid, geometry)%>%mutate(zone = 'com')

#Downtown
Downtown <- st_read("https://raw.githubusercontent.com/OliviaScalora/MUSA508_Final/main/Data/Zoning%20Districts.geojson")%>%
  st_transform(st_crs(tracts17))%>%
  filter(zoning %in% c("DC","DB-2","DB-1", "DR-2", "DR-3", "DR-1"))%>%
  dplyr::select(objectid, geometry)%>%mutate(zone = 'DT')

#Industrial
Industrial <- st_read("https://raw.githubusercontent.com/OliviaScalora/MUSA508_Final/main/Data/Zoning%20Districts.geojson")%>%
  st_transform(st_crs(tracts17))%>%
  filter(zoning %in% c("LI", "HI", "AG", "AG"))%>%
  dplyr::select(objectid, geometry)%>%mutate(zone = 'ind')


zoning<- rbind(Commercial, HD_Residential, LD_Residential, Downtown, Industrial)
zoning$zone <- factor(zoning$zone, levels = c('HD','LD','DT','com','ind'))




#ZonePlot1
ggplot()+
  geom_sf(data= zoning,
          aes(fill= zone),
          color = NA,)+
  geom_sf(data = opioid_data17%>%distinct(geometry, .keep_all = TRUE),
          aes(size = count),
          color = '#f26847', alpha = .8)+
  scale_fill_manual(values = c("#4281a4","#9cafb7", "#f26847", "#f6ad85", "#c3c1b8"),
                    labels=c('High Density Res','Low Density Res','Downtown','Commercial','Industrial')) +
  scale_size_continuous(breaks=seq(1, 4, by=1),
                        range = c(1,9))+
  geom_sf(data=city_boundary,color = '#473126',size = .5,fill = NA)+
  guides(size = guide_legend(override.aes = list(size = c(1,3,6,9))))+
  labs(size = 'Count at nearest\n 1/3 mi. interval', fill = 'Zone Type',
       caption = 'figure 2.',
       title = 'Mesa, AZ: 2017 Opioid Overdoses',
       subtitle= 'Total: 246')+
   mapTheme()+theme(panel.background = element_rect(fill = "#f0efeb", color = "#f0efeb"),
                   legend.background = element_rect(fill="#f0efeb", color = "#f0efeb"),
                   plot.background = element_rect(fill = "#f0efeb", color = "#f0efeb"),
                   legend.text = element_text(color='#2b2005'),
                   legend.title = element_text(color = '#2b2005'),
                   plot.title = element_text(color = '#2b2005', size = 22),
                   panel.grid = element_blank())


 #----MAKE FISHNET----

fishnet <- 
  st_make_grid(city_boundary,
               cellsize = 1760, 
               square = TRUE) %>%
  .[city_boundary] %>%  
  st_sf() %>%
  mutate(uniqueID = rownames(.))

#join overdoses to the fishnet
opioid_net <- 
  dplyr::select(opioid_data17) %>% 
  mutate(countoverdose = 1) %>% 
  aggregate(., fishnet, sum) %>%
  mutate(countoverdose = replace_na(countoverdose, 0),
         uniqueID = rownames(.),
         cvID = sample(round(nrow(fishnet) / 24), size=nrow(fishnet), replace = TRUE))
#opioid fishnet plot
ggplot() +
  geom_sf(data = opioid_net, aes(fill = countoverdose), color= NA)+
  scale_fill_viridis(option = 'F', direction = -1) +
  labs(title = "figure 3: Count of Overdoses for the fishnet",
       subtitle = "Mesa, AZ") +
  mapTheme()+ theme(legend.position="bottom",
                      legend.key.width = unit(.8, 'cm'))


#----JOIN ZONING DATA TO FISHNET----

#Industrial
#Create polygon centroid
Ind_center<- st_centroid(Industrial)%>%mutate(Legend = "Industrial")


#Commercial
#Create polygon centroid
Comm_center<- st_centroid(Commercial)%>%mutate(Legend = "Commercial")


#low density residential
#Create polygon centroid
LDR_center<- st_centroid(LD_Residential)%>%mutate(Legend = "Low Density Residential")


# high density residential polygons 
#Create polygon centroid
HDR_center<- st_centroid(HD_Residential)%>%mutate(Legend = "High Density Residential")

#Create polygon centroid
DT_center<- st_centroid(Downtown)%>%mutate(Legend = "Downtown")

#join zones to fishnet + plot
zone_vars_net <- 
  rbind(DT_center, HDR_center, LDR_center, Comm_center, Ind_center) %>%
  st_join(., fishnet, join=st_within) %>%
  st_drop_geometry() %>%
  group_by(uniqueID, Legend) %>%
  summarize(count = n()) %>%
  full_join(fishnet) %>%
  spread(Legend, count, fill=0) %>%
  st_sf() %>%
  dplyr::select(-`<NA>`) %>%
  na.omit() %>%
  ungroup()

zone_vars_net.long <- 
  gather(zone_vars_net, Variable, value, -geometry, -uniqueID)
zone_vars <- unique(zone_vars_net.long$Variable)
zone_mapList <- list()

for(i in zone_vars){
  zone_mapList[[i]] <- 
    ggplot() +
    geom_sf(data = filter(zone_vars_net.long, Variable == i), aes(fill=value), colour=NA) +
    scale_fill_viridis(option = 'F', direction = -1) +
    labs(title=i) +
    mapTheme()+ theme(legend.position="bottom",
                      legend.key.width = unit(.8, 'cm'))
}


do.call(grid.arrange,c(zone_mapList, ncol=2, top="figure 4.1: Density of Zone Type by Fishnet"))


#----JOIN CRIME DATA TO FISHNET----
crime_vars_net <- 
  rbind(misdemeanor, felony) %>%
  st_join(., fishnet, join=st_within) %>%
  st_drop_geometry() %>%
  group_by(uniqueID,Legend) %>%
  summarize(count = n()) %>%
  full_join(fishnet) %>%
  spread(Legend, count, fill=0) %>%
  st_sf() %>%
  dplyr::select(-`<NA>`) %>%
  na.omit() %>%
  ungroup()

crime_vars_net.long <- 
  gather(crime_vars_net, Variable, value, -geometry, -uniqueID)
crime_vars <- unique(crime_vars_net.long$Variable)
crime_mapList <- list()

for(i in crime_vars){
  crime_mapList[[i]] <- 
    ggplot() +
    geom_sf(data = filter(crime_vars_net.long, Variable == i), aes(fill=value), colour=NA) +
    scale_fill_viridis(option = 'F', direction = -1) +
    labs(title=i) +
    mapTheme()+ theme(legend.position="bottom",
                      legend.key.width = unit(.8, 'cm'))
}

do.call(grid.arrange,c(crime_mapList, ncol=2, top="figure 4.2: CrimeRisk Factor by Fishnet"))


#----JOIN POINT VARIABLES TO FISHNET----

#Point data to fishnet
point_vars_net <- 
  rbind(arts_edu,cccenter,mesa_police_fire,parks,public_housing,vacant,parks, light_rail) %>%
  st_join(., fishnet, join=st_within) %>%
  st_drop_geometry() %>%
  group_by(uniqueID,Legend) %>%
  summarize(count = n()) %>%
  full_join(fishnet) %>%
  spread(Legend, count, fill=0) %>%
  st_sf() %>%
  dplyr::select(-`<NA>`) %>%
  na.omit() %>%
  ungroup()

point_vars_net.long <- 
  gather(point_vars_net, Variable, value, -geometry, -uniqueID)
point_vars <- unique(point_vars_net.long$Variable)
point_mapList <- list()

for(i in point_vars){
  point_mapList[[i]] <- 
    ggplot() +
    geom_sf(data = filter(point_vars_net.long, Variable == i), aes(fill=value), colour=NA) +
    scale_fill_viridis(option = 'F', direction = -1) +
    labs(title=i) +
    mapTheme()+ theme(legend.position="bottom",
                      legend.key.width = unit(.8, 'cm'))
}


do.call(grid.arrange,c(point_mapList, ncol=2, top="figure 4.3: Risk Factor by Fishnet"))


 #----JOINING CENSUS VARIABLES TO FISHNET----

#Point data to fishnet
census_vars_net<- 
  rbind(high_vacancy_bg, low_med_rent_BG, low_inc_BG, white_bg, hisp_bg) %>%
  st_join(fishnet, ., join=st_intersects) %>%
  st_drop_geometry() %>%
  group_by(uniqueID,Legend) %>%
  summarize(count = n()) %>%
  full_join(fishnet) %>%
  spread(Legend, count, fill=0) %>%
  st_sf() %>%
  dplyr::select(-`<NA>`) %>%
  na.omit() %>%
  ungroup()

census_vars_net.long <- 
  gather(census_vars_net, Variable, value, -geometry, -uniqueID)
census_vars <- unique(census_vars_net.long$Variable)
census_mapList <- list()

for(i in census_vars){
  census_mapList[[i]] <- 
    ggplot() +
    geom_sf(data = filter(census_vars_net.long, Variable == i), aes(fill=value), colour=NA) +
    scale_fill_viridis(option = 'F', direction = -1) +
    labs(title=i) +
    mapTheme()+ theme(legend.position="bottom",
                      legend.key.width = unit(.8, 'cm'))
}

#GRID SHOWS how many block groups each cell of fishnet intersects with that are defined as each variable. 
#max interesection is 4,5,and 6. 

do.call(grid.arrange,c(census_mapList, ncol=2, top="figure 4.4: Census Variable by Fishnet"))


#----JOIN POINT NN VARIABLES TO FISHNET----

#join nearest neighbor to fishnet
point_vars_net.nn <-
  point_vars_net %>%
  mutate(
    Park.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(parks),3),
    Child.Crisis.Center.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(cccenter),3),
    Arts.and.Education.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(arts_edu),3),
    Police.Fire.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(mesa_police_fire),3),
    Public.Housing.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(public_housing),3),
    Vacant.Property.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(vacant),3),
    Lightrail.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(light_rail),3))%>%
  dplyr::select(ends_with(".nn"),uniqueID, geometry)

point_vars_net.nn.long <- 
  gather(point_vars_net.nn, Variable, value, -geometry, -uniqueID)
point_vars.nn <- unique(point_vars_net.nn.long$Variable)
point_mapList.nn <- list()

for(i in point_vars.nn){
  point_mapList.nn[[i]] <- 
    ggplot() +
    geom_sf(data = filter(point_vars_net.nn.long, Variable == i), aes(fill=value), colour=NA) +
    scale_fill_viridis(option = 'F', direction = -1) +
    labs(title=i) +
    mapTheme()+ theme(legend.position="bottom",
                      legend.key.width = unit(.8, 'cm'))
}

do.call(grid.arrange,c(point_mapList.nn, ncol=2, top="figure 4.5: Nearest Neighbor Risk Factors by Fishnet"))

 #----JOINING CENSUS VARIABLES TO FISHNET----


census_vars_net.nn <-
  census_vars_net %>%
  mutate(
    high_vacancy.nn =
      nn_function(st_coordinates(st_centroid(census_vars_net)), st_coordinates(st_centroid(high_vacancy_bg)),3),
    low_med_rent.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(st_centroid(low_med_rent_BG)),3),
    low_inc.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(st_centroid(low_inc_BG)),3),
    maj.hisp.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(st_centroid(hisp_bg)),3),
    maj.white.nn =
      nn_function(st_coordinates(st_centroid(point_vars_net)), st_coordinates(st_centroid(white_bg)),3))%>%
  dplyr::select(ends_with(".nn"),uniqueID, geometry)

census_vars_net.nn.long <- 
  gather(census_vars_net.nn, Variable, value, -geometry, -uniqueID)
census_vars.nn <- unique(census_vars_net.nn.long$Variable)
census_mapList.nn <- list()

for(i in census_vars.nn){
  census_mapList.nn[[i]] <- 
    ggplot() +
    geom_sf(data = filter(census_vars_net.nn.long, Variable == i), aes(fill=value), colour=NA) +
    scale_fill_viridis(option = 'F', direction = -1) +
    labs(title=i) +
    mapTheme()+ theme(legend.position="bottom",
                      legend.key.width = unit(.8, 'cm'))
}
do.call(grid.arrange,c(census_mapList.nn, ncol=2, top="figure 4.6: Census Variable Nearest Neighbor by Fishnet"))

#----MERGE ALL VARIABLES TO FISHNET----

vars_net<- cbind(point_vars_net,(st_drop_geometry(zone_vars_net)%>%
                    dplyr::select(-uniqueID)))
vars_net<- cbind(vars_net,(st_drop_geometry(point_vars_net.nn))%>%
                   dplyr::select(-uniqueID))
vars_net<- cbind(vars_net,(st_drop_geometry(census_vars_net))%>%
                   dplyr::select(-uniqueID))
vars_net<- cbind(vars_net,(st_drop_geometry(census_vars_net.nn))%>%
                    dplyr::select(-uniqueID))
vars_net<- cbind(vars_net,(st_drop_geometry(crime_vars_net))%>%
                   dplyr::select(-uniqueID))

final_net <-
  left_join(opioid_net, st_drop_geometry(vars_net), by="uniqueID") 

final_net <-
  st_centroid(final_net) %>%
  st_join(dplyr::select(mesa_tracts17, GEOID), by = "uniqueID") %>%
  st_drop_geometry() %>%
  left_join(dplyr::select(final_net, geometry, uniqueID)) %>%
  st_sf() %>%
  na.omit()


## generates warnings from PROJ issues
## {spdep} to make polygon to neighborhoods... 
final_net.nb <- poly2nb(as_Spatial(final_net), queen=TRUE)
## .. and neighborhoods to list of weigths
final_net.weights <- nb2listw(final_net.nb, style="W", zero.policy=TRUE)
#print(final_net.weights, zero.policy=TRUE)
## see ?localmoran
local_morans <- localmoran(final_net$countoverdose, final_net.weights, zero.policy=TRUE) %>% 
  as.data.frame()


# join local Moran's I results to fishnet

final_net.localMorans <- 
  cbind(local_morans, as.data.frame(final_net)) %>% 
  st_sf() %>%
  dplyr::select(overdose_count = countoverdose, 
                Local_Morans_I = Ii, 
                P_Value = `Pr(z != E(Ii))`) %>%
  mutate(Significant_Hotspots = ifelse(P_Value <= 0.001, 1, 0)) %>%
  gather(Variable, Value, -geometry)

vars <- unique(final_net.localMorans$Variable)
varList <- list()

for(i in vars){
  varList[[i]] <- 
    ggplot() +
    geom_sf(data = filter(final_net.localMorans, Variable == i), 
            aes(fill = Value), colour=NA) +
    scale_fill_viridis(option = "F", direction = -1) +
    labs(title=i) +
    mapTheme()+ theme(legend.position="bottom",
                      legend.key.width = unit(.8, 'cm'))
  }

final_net <- final_net %>% 
  mutate(overdose.isSig = 
           ifelse(local_morans[,5] <= 0.001, 1, 0)) %>%
  mutate(overdose.isSig.dist = 
           nn_function(st_coordinates(st_centroid(final_net)),
                       st_coordinates(st_centroid(filter(final_net, 
                                           overdose.isSig == 1))),k = 1))
final_net.long <- 
  gather(final_net, Variable, value, -geometry, -uniqueID)


do.call(grid.arrange,c(varList, ncol = 2, top = "figure 5.1: Local Morans I statistics, Mesa, AZ Overdose Incidents"))
ggplot() +
      geom_sf(data = filter(final_net.long, Variable == "overdose.isSig.dist"), 
              aes(fill=as.numeric(value)), colour=NA) +
      scale_fill_viridis(option = 'F', direction = -1) +
      labs(fill = 'Distance (ft)',
           title= 'figure 5.2: Distance to Significant Overdose Hotspot',
           subtitle = 'Mesa, AZ') +
      mapTheme()+ theme(legend.position="bottom",
                      legend.key.width = unit(.8, 'cm'))

#----CORRELATION PLOTS----

#### for counts
correlation.long <-
  st_drop_geometry(final_net) %>%
  dplyr::select(-uniqueID, -cvID, -GEOID) %>%
  gather(Variable, Value, -countoverdose)

##### for nn
correlation.cor <-
  correlation.long %>%
  group_by(Variable) %>%
  summarize(correlation = cor(Value, countoverdose, use = "complete.obs"))

ggplot(correlation.long, aes(Value, countoverdose)) +
  geom_point(size = 1, color = '#223843') +
  geom_text(data = correlation.cor, aes(label = paste("r =", round(correlation, 2))),
            x=-Inf, y=Inf, vjust = 1.5, hjust = -.1) +
  geom_smooth(method = "lm", se = FALSE, colour = "black") +
  facet_wrap(~Variable, ncol = 5, scales = "free") +
  labs(title = "Narcotics Incidents count as a function of risk factors",
       caption = "figure 6.1") +
  plotTheme()+theme(strip.text.x = element_text(size = 10),
                    strip.background = element_rect(fill= '#f0efeb', color=NA))

numericVars <-
  select_if(st_drop_geometry(final_net), is.numeric) %>% na.omit()

ggcorrplot(
  round(cor(numericVars), 1),
  p.mat = cor_pmat(numericVars),
  colors = c("#40042d", "#f0efeb", "#f7580f"),
  type="lower",
  insig = "blank") +
  labs(title = "Correlation across numeric variables", 
       caption = 'figure 6.2')+theme(panel.background = element_rect(fill = "#f0efeb", color = "#f0efeb"),
                   legend.background = element_rect(fill="#f0efeb", color = "#f0efeb"),
                   plot.background = element_rect(fill = "#f0efeb", color = "#f0efeb"),
                   legend.text = element_text(color='#2b2005'),
                   legend.title = element_text(color = '#2b2005'),
                   plot.title = element_text(color = '#2b2005', size = 22),
                   panel.grid = element_blank(),
                   plot.margin= margin(1,1,1,1, unit = 'cm'))

final_net %>%
  ggplot(aes(countoverdose,))+
  geom_histogram(bins = 10, colour="black", fill = '#f69c73') +
  scale_x_continuous(breaks = seq(0, 20, by = 2)) + 
  scale_y_continuous(breaks = seq(0,1500, by = 200))+
  labs(title="Distribution of Overdose Occurances", subtitle = "Mesa, AZ",
       x="Overdose Occurance Count", y="Frequency", 
       caption = "figure 7.") +plotTheme()+theme(panel.background = element_rect(fill = "#f0efeb", color = "#f0efeb"),
                   legend.background = element_rect(fill="#f0efeb", color = "#f0efeb"),
                   plot.background = element_rect(fill = "#f0efeb", color = "#f0efeb"),
                   panel.border = element_blank(),
                   legend.text = element_text(color='#2b2005'),
                   legend.title = element_text(color = '#2b2005'),
                   plot.title = element_text(color = '#2b2005', size = 22),
                   panel.grid = element_blank())
## define the variables we want

#Just Risk Factors
reg.vars <- c("Commercial", "High.Density.Residential", "Low.Density.Residential",  "Downtown", 
                 "Park.nn", "Arts.and.Education.nn", "Police.Fire.nn", "Public.Housing", "Vacant.Property", "Low.Income", "Low.Rent",
                 "Majority.White", "Majority.Hispanic","Child.Crisis.Center", "Industrial", "misdemeanor", "felony", "lightrail")

#Include Local Moran's I Statistic
reg.ss.vars <- c("Commercial", "High.Density.Residential", "Low.Density.Residential", "Downtown", 
                 "Park.nn", "Arts.and.Education.nn", "Police.Fire.nn", "Public.Housing", "Vacant.Property", "Low.Income", "Low.Rent",
                 "Majority.White", "Majority.Hispanic","Child.Crisis.Center", "Industrial", "misdemeanor", "felony","lightrail", "overdose.isSig.dist", "overdose.isSig")


## RUN REGRESSIONS

#### K-fold
#Just Risk Factors
reg.CV <- crossValidate(
  dataset = final_net,
  id = "cvID",                           
  dependentVariable = "countoverdose",
  indVariables = reg.vars) %>%
  dplyr::select(cvID, countoverdose, Prediction, geometry)

#Spatial Process
reg.ss.CV <- crossValidate(
  dataset = final_net,
  id = "cvID",                           
  dependentVariable = "countoverdose",
  indVariables = reg.ss.vars) %>%
  dplyr::select(cvID, countoverdose, Prediction, geometry)


#### Spatial LOGO-CV
#Just Risk Factors
reg.spatialCV <- crossValidate(
  dataset = final_net,
  id = "GEOID",                           
  dependentVariable = "countoverdose",
  indVariables = reg.vars) %>%
  dplyr::select(cvID = GEOID, countoverdose, Prediction, geometry)


#Spatial Process
reg.ss.spatialCV <- crossValidate(
  dataset = final_net,
  id = "GEOID",                           
  dependentVariable = "countoverdose",
  indVariables = reg.ss.vars) %>%
  dplyr::select(cvID = GEOID, countoverdose, Prediction, geometry)

#### bind

reg.summary <- 
  rbind(
    mutate(reg.CV,        Error = Prediction - countoverdose,
           Regression = "Random k-fold CV: Just Risk Factors"),
    mutate(reg.ss.CV,        Error = Prediction - countoverdose,
           Regression = "Random k-fold CV: Spatial Process"),
    mutate(reg.spatialCV, Error = Prediction - countoverdose,
           Regression = "Spatial LOGO-CV: Just Risk Factors"),
    mutate(reg.ss.spatialCV, Error = Prediction - countoverdose,
           Regression = "Spatial LOGO-CV: Spatial Process")) %>%
  st_sf() 

# calculate errors by Block Group
error_by_reg_and_fold <- 
  reg.summary  %>%
  group_by(Regression, cvID) %>% 
  summarize(Mean_Error = mean(Prediction - countoverdose, na.rm = T),
            MAE = mean(abs(Mean_Error), na.rm = T),
            SD_MAE = mean(abs(Mean_Error), na.rm = T)) %>%
  ungroup()



#----HISTOGRAMS----
error_by_reg_and_fold %>% ggplot(aes(MAE))+
  geom_histogram(bins=30, color = 'black', fill = '#f69c73')+
  facet_wrap(~Regression)+
  geom_vline(xintercept= 0)+scale_x_continuous(breaks=seq(0,8,by=1))+
  labs(title='Distribution of MAE', subtitle='k-fold cross validation vs. LOGO-CV',
       x= 'Mean Absolute Erorr', y = 'Count', caption = 'figure 8.')+plotTheme()+theme(strip.text.x = element_text(size = 10),
                    strip.background = element_rect(fill= '#f0efeb', color=NA))

#-----SPATIAL PLOTS-----

ggplot() +
  geom_sf(data = reg.summary, aes(fill = Prediction, colour = Prediction)) +
  scale_fill_viridis(option = "F", direction = -1) +
  scale_colour_viridis(option = "F", direction = -1) +
  facet_wrap(~Regression) +  
  labs(title="Narcotics Incidents Errors", subtitle = "Random K-Fold and Spatial Cross Validation", caption = "figure 9.") +
  mapTheme()+theme(strip.text.x = element_text(size = 10),
                    strip.background = element_rect(fill= '#f0efeb', color=NA))
st_drop_geometry(error_by_reg_and_fold) %>%
  group_by(Regression) %>% 
  summarize(Mean_MAE = round(mean(MAE), 2),
            SD_MAE = round(sd(MAE), 2)) %>%
  kable() %>%kable_styling(html_font = 'helvetica')%>%
  kable_paper()%>%column_spec(c(1,2,3), background = "#f0efeb")%>%row_spec(0, background = "#f0efeb", color= 'black',hline_after = TRUE)

error_by_reg_and_fold%>%
  filter(str_detect(Regression, 'LOGO'))%>%
  ggplot()+
  geom_sf(aes(fill=MAE))+
  facet_wrap(~Regression)+
  scale_fill_viridis(option = "F", direction = -1) +
  labs(title='Distribution of MAE', subtitle='k-fold cross validation vs. LOGO-CV',
       x= 'Mean Absolute Erorr', y = 'Count', caption = 'figure 10.')+mapTheme()+theme(strip.text.x = element_text(size = 10),
                    strip.background = element_rect(fill= '#f0efeb', color=NA))

#----NEIGHBORHOOD WEIGHTS-----
  
neighborhood.weights <-
  filter(error_by_reg_and_fold, Regression == "Spatial LOGO-CV: Spatial Process") %>%
  group_by(cvID) %>%
  poly2nb(as_Spatial(.), queen=TRUE) %>%
  nb2listw(., style="W", zero.policy=TRUE)

filter(error_by_reg_and_fold, str_detect(Regression, "LOGO"))  %>% 
  st_drop_geometry() %>%
  group_by(Regression) %>%
  summarize(Morans_I = moran.mc(abs(Mean_Error), neighborhood.weights, 
                                nsim = 999, zero.policy = TRUE, 
                                na.action=na.omit)[[1]],
            p_value = moran.mc(abs(Mean_Error), neighborhood.weights, 
                               nsim = 999, zero.policy = TRUE, 
                               na.action=na.omit)[[3]])%>%
  kable() %>%kable_styling(html_font = 'helvetica')%>%
  kable_paper()%>%column_spec(c(1,2,3), background = "#f0efeb")%>%row_spec(0, background = "#f0efeb", color= 'black',hline_after = TRUE)


grid.arrange(
  reg.summary%>%filter(Regression == "Spatial LOGO-CV: Just Risk Factors")%>%
    ggplot() + geom_sf(aes(fill = Prediction, colour = Prediction))+
    scale_fill_viridis(option = "F", direction = -1) +
    scale_colour_viridis(option = "F", direction = -1)+
    labs(title='Spatial LOGO-CV: \nJust Risk Factors',
         caption = 'figure 11.')+mapTheme()+theme(panel.border=element_blank(),
                                                      title = element_text(size = 12),
                                                                      legend.position = "bottom",
                                                                      legend.key.size = unit(.5, 'cm')),
  reg.summary%>%filter(Regression == "Spatial LOGO-CV: Spatial Process")%>%
    ggplot() + geom_sf(aes(fill = Prediction, colour = Prediction))+
    scale_fill_viridis(option = "F", direction = -1) +
    scale_colour_viridis(option = "F", direction = -1)+

    labs(title='Spatial LOGO-CV: \nSpatial Process')+mapTheme()+theme(panel.border=element_blank(),
                                                      title = element_text(size = 12),
                                                                    legend.position = "bottom",
                                                                    legend.key.size = unit(.5, 'cm')),
  ggplot() +
    geom_sf(data = reg.summary, aes(fill = countoverdose, colour = countoverdose))+
    scale_fill_viridis(option = "F", direction = -1) +
    scale_colour_viridis(option = "F", direction = -1)+
    labs(title='Observed \nOverdoses')+mapTheme()+theme(panel.border=element_blank(),
                                                      title = element_text(size = 12),
                                                      legend.position = "bottom",
                                                      legend.key.size = unit(.5, 'cm')), nrow=1)

st_drop_geometry(reg.summary) %>%
  group_by(Regression) %>%
  mutate(overdose_Decile = ntile(countoverdose, 10)) %>%
  group_by(Regression, overdose_Decile) %>%
  summarize(meanObserved = mean(countoverdose, na.rm=T),
            meanPrediction = mean(Prediction, na.rm=T)) %>%
  gather(Variable, Value, -Regression, -overdose_Decile) %>%          
  ggplot(aes(overdose_Decile, Value, shape = Variable)) +
  geom_point(size = 2) + geom_path(aes(group = overdose_Decile), colour = "black") +
  scale_shape_manual(values = c(2, 17)) +
  facet_wrap(~Regression) + xlim(0,10) +
  labs(title = "Predicted and observed burglary by observed burglary decile",
       caption = 'figure 12.')  +
  plotTheme()+theme(strip.text.x = element_text(size = 10),
                    strip.background = element_rect(fill= '#f0efeb', color=NA))

#Test  generalizability across block group
#Get 2018 Data
blockgroups19 <- get_acs(geography = "block group", variables = c("B01003_001","B03002_012"), 
                         year=2019, state=04, county=013, geometry=T) %>% 
  st_transform('EPSG:2224')  %>% 
  dplyr::select(variable, estimate, GEOID) %>%
  spread(variable, estimate) %>%
  rename(TotalPop = B01003_001,
         HispTotal = B03002_012) %>%
  mutate(Pct.Hisp = HispTotal/TotalPop,
         raceContext = ifelse(Pct.Hisp > .5, "Majority_Hispanic", "Majority_Non_Hispanic")) %>%
  .[mesa_tracts17,]


reg.summary %>%
  st_centroid() %>%
  st_join(blockgroups19) %>%
  na.omit() %>%
  st_drop_geometry() %>%
  group_by(Regression, raceContext) %>%
  summarize(mean.Error = mean(Error, na.rm = T)) %>%
  spread(raceContext, mean.Error) %>%
  kable() %>%kable_styling(html_font = 'helvetica')%>%
  kable_paper()%>%column_spec(c(1,2,3), background = "#f0efeb")%>%row_spec(0, background = "#f0efeb", color= 'black',hline_after = TRUE)
opioid_data18 <- opioid_data%>%filter(year=='2018')

#kernel density plot 2017-2021
p1<- ggplot()+
  geom_sf(data=city_boundary, fill='#22192b', color='black')+
  stat_density2d(data= data.frame(st_coordinates(opioid_data17)),
                 aes(X,Y, fill=..level..,alpha=..level..),
                 size=0.01, bins=40, geom='polygon')+
  scale_fill_viridis(option = 'F',  direction = 1) +
  scale_alpha(range=c(0.00,0.35), guide='none')+
  geom_sf(data = opioid_data17%>%distinct(geometry, .keep_all=TRUE),
          aes(size = count),
          color = '#fffaf2',
          alpha = .5)+
  guides(size = guide_legend(override.aes = list(color = '#2f273d', alpha=.2)))+
  labs(subtitle = '2017',
       fill = 'Density',
       size = 'Count at nearest\n 1/3 mi. interval',
       caption = 'figure 13.')+
  scale_size_continuous(breaks=seq(1, 61, by=15),
                        range = c(1,10))+ mapTheme()+theme(legend.key.size = unit(.65, 'cm'))

p2<- ggplot()+
  geom_sf(data=city_boundary, fill='#22192b', color='black')+
  stat_density2d(data= data.frame(st_coordinates(opioid_data18)),
                 aes(X,Y, fill=..level..,alpha=..level..),
                 size=0.01, bins=40, geom='polygon')+
  scale_fill_viridis(option = 'F',  direction = 1) +
  scale_alpha(range=c(0.00,0.35), guide='none')+
  geom_sf(data = opioid_data18%>%distinct(geometry, .keep_all=TRUE),
          aes(size = count),
          color = '#fffaf2',
          alpha = .5)+
  scale_size_continuous(breaks=seq(1, 61, by=15),
                        range = c(1,10))+ 
  guides(size = guide_legend(override.aes = list(color = '#2f273d', alpha=.2)))+
  labs(subtitle = '2018',
       fill = 'Density',
       size = 'Count at nearest\n 1/3 mi. interval')+
  mapTheme()

p3<- ggplot()+
  geom_sf(data=city_boundary, fill='#22192b', color='black')+
  stat_density2d(data= data.frame(st_coordinates(opioid_data17)),
                 aes(X,Y, fill=..level..,alpha=..level..),
                 size=0.01, bins=40, geom='polygon')+
  scale_fill_viridis(option = 'F',  direction = 1) +
  scale_alpha(range=c(0.00,0.35), guide='none')+
  labs(subtitle = 'Kernel Density 2017')+ mapTheme()

p4<- ggplot()+
  geom_sf(data=city_boundary, fill='#22192b', color='black')+
  stat_density2d(data= data.frame(st_coordinates(opioid_data18)),
                 aes(X,Y, fill=..level..,alpha=..level..),
                 size=0.01, bins=40, geom='polygon')+
  scale_fill_viridis(option = 'F',  direction = 1) +
  scale_alpha(range=c(0.00,0.35), guide='none')+
  labs(subtitle = 'Kernel Density 2018')+ mapTheme()

mylegend<-g_legend(p1)

grid.arrange(p3 + theme(legend.position="none"),
             p4 + theme(legend.position="none"),
             mylegend, 
             p1 + theme(legend.position="none"),
             p2 + theme(legend.position="none"),
             nrow=2,ncol=3,widths=c(2,2,.5), top = "Opioid Overdose Occurances in Mesa, AZ")
  


# In this section, we ask whether risk predictions outperform traditional ‘Kernel density’ hotspot mapping. 
# To add an element of across-time generalizability, 
# hotspot and risk predictions from these 2017 burglaries are used to predict the location of burglaries from 2018.

#Goodness of fit indicator - illustrate whether the 2017 kernel density or risk predictions capture
#more of the 2018 burglaries. If the risk predictions capture more observed burglaries than the kernel density,
#then the risk prediction model provides a more robust targeting tool for allocating police resources.

#Step1 - 2018 opioid data 
opioid_data18 <- opioid_data%>%filter(year=='2018')

# Step 2
#Kernel Density for 2017 data
opioid_ppp <- as.ppp(st_coordinates(opioid_data), W = st_bbox(final_net))
opioid_KD <- density.ppp(opioid_ppp, 1000)

opioid_KDE_sf <- as.data.frame(opioid_KD) %>%
  st_as_sf(coords = c("x", "y"), crs = st_crs(final_net)) %>%
  aggregate(., final_net, mean) %>%
  mutate(label = "Kernel Density",
         Risk_Category = ntile(value, 100),
         Risk_Category = case_when(
           Risk_Category >= 90 ~ "90% to 100%",
           Risk_Category >= 70 & Risk_Category <= 89 ~ "70% to 89%",
           Risk_Category >= 50 & Risk_Category <= 69 ~ "50% to 69%",
           Risk_Category >= 30 & Risk_Category <= 49 ~ "30% to 49%",
           Risk_Category >= 1 & Risk_Category  <= 29 ~ "1% to 29%")) %>%
  cbind(
    aggregate(
      dplyr::select(opioid_data18) %>% mutate(overdoseCount = 1), ., sum) %>%
      mutate(overdoseCount = replace_na(overdoseCount, 0))) %>%
  dplyr::select(label, Risk_Category, overdoseCount)

# Step 3
opioid_risk_sf <-
  filter(reg.summary, Regression == "Spatial LOGO-CV: Spatial Process") %>%
  mutate(label = "Risk Predictions",
         Risk_Category = ntile(Prediction, 100),
         Risk_Category = case_when(
           Risk_Category >= 90 ~ "90% to 100%",
           Risk_Category >= 70 & Risk_Category <= 89 ~ "70% to 89%",
           Risk_Category >= 50 & Risk_Category <= 69 ~ "50% to 69%",
           Risk_Category >= 30 & Risk_Category <= 49 ~ "30% to 49%",
           Risk_Category >= 1 & Risk_Category <= 29 ~ "1% to 29%")) %>%
  cbind(
    aggregate(
      dplyr::select(opioid_data18) %>% mutate(overdoseCount = 1), ., sum) %>%
      mutate(overdoseCount = replace_na(overdoseCount, 0))) %>%
  dplyr::select(label, Risk_Category, overdoseCount)

# Step 4
rbind(opioid_KDE_sf , opioid_risk_sf) %>%
  na.omit() %>%
  gather(Variable, Value, -label, -Risk_Category, -geometry) %>%
  ggplot() +
  geom_sf(aes(fill = Risk_Category), colour = NA) +
  geom_sf(data = opioid_data18%>%distinct(geometry, .keep_all=TRUE),
          aes(size = count),
          color = '#000000',
          alpha = .65)+
  scale_size_continuous(breaks=seq(1, 75, by=15),
                        range = c(1,6))+
  guides(size= guide_legend(title = "2018 Overdoses",override.aes = list(size = c(1,2,4,5,6),color = '#424453',alpha=.45)))+
  facet_wrap(~label, ) +
  scale_fill_viridis(option = "F", direction =-1, discrete = TRUE, alpha = .75) +
  labs(title="Comparison of Kernel Density and Risk Predictions",
       subtitle="2017 Overdose Risk Predictions; 2018 Overdose", caption = "figure 14.") +
  mapTheme()+theme(strip.text.x = element_text(size = 10),
                    strip.background = element_rect(fill= '#f0efeb', color=NA))

#calculates the rate of 2018 overdose points by risk category and model type
# well fit model should show that the risk predictions capture a greater share of 2018 overdoses
# in the highest risk category relative to the kernel density

rbind(opioid_KDE_sf , opioid_risk_sf) %>%
  st_set_geometry(NULL) %>% na.omit() %>%
  gather(Variable, Value, -label, -Risk_Category) %>%
  group_by(label, Risk_Category) %>%
  summarize(countoverdose = sum(Value)) %>%
  ungroup() %>%
  group_by(label) %>%
  mutate(Rate_of_test_set_overdose = countoverdose / sum(countoverdose)) %>%
  ggplot(aes(Risk_Category,Rate_of_test_set_overdose)) +
  geom_bar(aes(fill=label), position="dodge", stat="identity") +
  scale_fill_manual(values=c('#f69c73', "#721f58"))+
  labs(title = "Risk prediction vs. Kernel density, 2018 burglaries",
       caption = 'figure 15.') +
  plotTheme() + theme(axis.text.x = element_text(angle = 45, vjust = 0.5))

# JOIN PREDICTIONS TO ZONE TYPE
reg.summary.logo<-reg.summary%>% filter(Regression == "Spatial LOGO-CV: Spatial Process")%>%
  mutate(abs_Error = abs(Error))

grid.arrange(
#LOGO CV Spatial Predictions
reg.ss.spatialCV%>%
  ggplot() + geom_sf(aes(fill = Prediction, colour = Prediction))+
  scale_fill_viridis(option = "F", direction = -1) +
  scale_colour_viridis(option = "F", direction = -1)+
  labs(title='Predictions',subtitle='Spatial LOGO-CV: Spatial Process', caption = 'figure 16.1')+
  mapTheme()+theme(panel.border=element_blank(),legend.position = "bottom",legend.key.size = unit(.5, 'cm')),

#LOGO CV Spatial Errors
ggplot() +
  geom_sf(data = reg.summary.logo, aes(fill = abs_Error, colour = abs_Error)) +
  scale_fill_viridis(option = "F", direction = -1) +
  scale_colour_viridis(option = "F", direction = -1) +
  labs(title = 'Absolute Errors', subtitle='Spatial LOGO-CV: Spatial Process', caption = 'figure 16.2') +
  mapTheme()+theme(panel.border=element_blank(),legend.position = "bottom",legend.key.size = unit(.5, 'cm')), nrow=1)

#join downtown and commercial zoning districts

site<- rbind(Commercial, Downtown)
site.second<- rbind(HD_Residential, LD_Residential)
# reg.ss.spatialCV%>%
#   ggplot() +
#   geom_sf(data=site, fill=NA, color = 'black', size = .5, alpha = .5)+ 
#   geom_sf(aes(fill = Prediction, colour = Prediction), alpha=.5)+
#   geom_sf(data=city_boundary, fill = NA)+
#   scale_fill_viridis(option = "F", direction = -1) +
#   scale_colour_viridis(option = "F", direction = -1)+
#   labs(title='Predictions',subtitle='Spatial LOGO-CV: Spatial Process')+
#   mapTheme()+theme(panel.border=element_blank(),legend.position = "bottom",legend.key.size = unit(.5, 'cm'))
OOTC.site<- st_join(site, reg.ss.spatialCV, join=st_intersects)%>%filter(Prediction > 1)
OOTC.site.second<- st_join(site.second, reg.ss.spatialCV, join=st_intersects)%>%filter(Prediction > 1)

ggplot() +
  geom_sf(data= zoning,
          fill= 'grey90',
          color = NA,)+
  geom_sf(data =city_boundary, fill = NA, color ='black')+
  # geom_sf(data = mesa_tracts17, fill = NA, color ='grey75', size = .25)+
  geom_sf(data=OOTC.site, aes(fill = Prediction), color =NA)+
  scale_fill_viridis(option = "F", direction = -1) +
  geom_sf(data=OOTC.site.second, fill = 'lightblue', color =NA)+
  labs(title = 'Commercial and Downtown Zones with High Prediction Value', caption = 'figure 17.') +
  mapTheme()